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Abstract Bayesian filtering aims at tracking sequentially a hidden process 
from an observed one. In particular, sequential Monte Carlo (SMC) tech- 
niques propagate in time weighted trajectories which represent the posterior 
probability density function (pdf) of the hidden process given the available 
observations. On the other hand, Conditional Monte Carlo (CMC) is a vari- 
ance reduction technique which replaces the estimator of a moment of interest 
by its conditional expectation given another variable. In this paper we show 
that up to some adaptations, one can make use of the time recursive nature 
of SMC algorithms in order to propose natural temporal CMC estimators of 
some point estimates of the hidden process, which outperform the associated 
crude Monte Carlo (MC) estimator whatever the number of samples. We next 
show that our Bayesian CMC estimators can be computed exactly, or approx- 
imated efficiently, in some hidden Markov chain (HMC) models; in some jump 
Markov state-space systems ( JMSS) ; as well as in multitarget filtering. Finally 
our algorithms are validated via simulations. 

Keywords Conditional Monte Carlo • Bayesian Filtering • Hidden Markov 
Models • Jump Markov state space systems • Rao-Blackwell Particle Filters • 
Probability Hypothesis Density. 



1 Introduction 

1.1 SMC algorithms for single- or multi-object Bayesian filtering 

In single object Bayesian filtering we consider two random processes {X„}„>q 
and {Yn}n>o with given joint probability law. Yi is observed, i.e. we have at 
our disposal realizations yo.„ = {yi}2=o of Yo:„ = {Yi}f-^Q (as far as notations 
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are concerned, upper case letters denote random variables (r.v.), lower case 
ones their realizations, and bold letters vectors; p{x), say, denotes the pdf of 
r.v. X and p{x\y)^ say, the conditional pdf of X given Y = y; ii i < j Pi;j\n 
is a shorthand notation for p{xi:j\yo:7i)] if are samples from p{x) then the 
set {x^jfLi can also be denoted x^-^; subscripts are reserved for times indices 
and superscripts for realizations). Process {^n} is hidden, and our aim is to 
compute, for each time instant n, some moment of interest 

0,l= y" /(X0:„)p(x0:„|y0:n)dx0:„ (1) 

of the a posteriori pdf p(xo:„|yo:n) of Xo:„ given yo:n. Unfortunately, in most 
models (1) cannot be computed exactly. Suboptimal solutions for computing 
On include SMC techniques [1] [2], which propagate over time weighted trajec- 
tories {x^.„, wl^}fL^ with K = 1- In other words, po.„|„ = X^^Ii <^xf, „ > 
in which 6 is the Dirac mass, is a discrete (and random) approximation of 

P(XO:„|yO:n)- 

On the other hand, muJti-object filtering (see e.g. [3]) essentially reduces 
to computing = / /(x„)?;„(x„)dx„ in which Vni^n) is now the so-called 
Probability Hypothesis Density (PHD), i.e. the a posteriori spatial density 
of the expected number of targets, given all measurements (be they due to 
detected targets or to false alarms). Again, SMC techniques propagate an 
approximation of with a set of weighted samples {x^, here "^lu 

which in general is different from 1, is an estimator of the number of targets. 

Now, SMC algorithms, be they for single- or multi-object Baycsian filtering, 
usually focus on how to propagate approximations po:n|ri (or ^n) of Po:ri|Ti (or 
w„); once Po:ri|Ti or has been computed. On is finally estimated either as 

= Ezli ^n/(xo:n) Eili By coutrast, in this paper we directly 

focus on On itself, and see under which conditions one can improve this point 
estimator at a reasonable computational cost. 

1.2 Variance reduction via conditioning: Rao-Blackwellizcd particle filters 
(RB-PF) 

This problem leads us to variance reduction techniques which form an impor- 
tant part of computer simulation (see e.g. [4]). Among them, methods based 
on conditioning variables rely on the following well known result. Let Xi and 
X2 be two r.v. and / some function. Then 

E(E(/(X2)|Xi)) =E(/(X2)), (2) 
var(E(/(X2)|Xi)) - var(/(X2)) - E(var(/(X2)|Xi)). (3) 

So if the aim is to compute O ~ E(/(A'2)) and we have at our disposal 
{Xl}fLi Pixi), {^2}ili V{x2) then the so-called CMC estimator 
O = jfY^^=i'^ifi'^'^)\'^i) lower variance than the corresponding crude 
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MC one = ^iLi f{^2)- Of course, the interest of O vs. depends on 
the choice of Xi: ideally, one should easily sample from p{xi); the variance 
reduction in (3) should be as large as possible; but in the meantime function 
g{xi) ~ E(/(X2)|xi) should remain computable at a reasonable computational 
cost. 

Variance reduction techniques based on CMC methods have been adapted 
to Bayesian filtering; in this context, these methods are either known as 
marginalized or RB-PF [5] [6] [7] [8]. The rationale is as follow. Let now 6>„ 
in (1) be rewritten as 6* = E(/(Xi,X2)). It is usually not possible to sample 
from p(xi,X2), and often p{xi,X2) <x. p'(xi^X2) is only known up to a con- 
stant, whence the use of Bayesian (or normalized) importance sampling (IS) 
techniques [9]. So let 

N 

J2^U^l^,^2'')fi^\.^2) With (.t1,X^) ^ 92, (4) 
1=1 
N 

Y,<{^i''mf{^l,X2)\x\) with x\ ^ gi, (5) 

i=l 

with J2^=i^i = J2iLi^2 = 1- Estimator 6*^^ depends on samples {x\}f^j^ 
only and is known as the RB estimator of 0. However 0^^ is known to out- 
perform only under specific assumptions on gi, q2, w}'^ and Wj'^. In par- 
ticular, if w\ (X w"'* = p' {x\) / qi{x\) , wl cx lUj'* = p'{x\,X2)/q2{x\,X2) and 
qi{xi) = J 92(2:1, a;2)dx2, then the variance of w"'^ can only lower than that of 
W2'^ [6]. If moreover {x\, x\) are independent, an asymptotic analysis based on 
(2) and (3) proves that 6*^^ indeed outperforms [7]. However, independence 
never holds in the presence of resampling; in the general case, the compari- 
son of both estimators depends on the choice of the importance distributions 
qi and q2, and can be proved (asympotically) only under specific sufficient 
conditions [10] [11]. 

RB-PF have been applied in the specific case where the state vectors Xom 
can be partitioned into a "linear" component X2 = Xq.„ and a "non-linear" 
one Xi = Xq.'„. Models in which computing 0^^ is possible include linear 
and Gaussian JMSS [7] [5] or partially linear and Gaussian HMC [8] . In other 
models, it may be possible to approximate 0^^ by using numerical approxima- 
tions of 'Wi{x) and of E(/(Xi, X2)|a;i). However, due to the spatial structure 
of the decomposition of xo:„, approximating 0n in (1) involves propagating 
numerical approximations over time. 

1.3 Bayesian CMC estimators 
1.3.1 Spatial vs. temporal RB-PF 

In this paper we propose another class of RB-PF; the main difference is that 
our partitioning {Xi,X2) of xo:„ is now temporal rather than spatial. The 
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question arises naturally in the Bayesian filtering context: at time n we usually 
build 0„ from po:n|n: but indeed po:n-i|ra-i was also available for free since, 
by nature, sequential MC algorithms construct po:n\n from po:ri-i|n-i- Now, 
comparing with spatially partitioned RB-PF, a temporal partition of Xo:„ has 
a number of statistical and computational structural consequences, as we now 
see. So let again 



= J f{Xi,X2)p{Xi,X2)dxidx2 



f{xi,X2)p{x2\xi)dX2 



p{xi)dxi. 



Let us start from the following approximation of p{xi): 



N 



(6) 
(7) 

(8) 



For 1 < i < let next X2 ~ p{x2\x\). This yields the following approximation 
of p{xi,X2): 

N 

Pixi,x2) = J2'^'i^i^)^{xi,xiy. (9) 

i=l 

note that each weight may depend on {x\}fLi, but not on {xlJfLi. The 
reason why is that we now use a temporal partition, and not a spatial one: in 
the spatial subdivision case, p{x2\xi) would reduce to p(xq.„|xq.'„, yom), which 
means that we would need to sample at each time step the whole set {^o^n}iLii 
instead of simply extending the trajectories. 

Finally we have two options: computing the full expectation in (6) by using 
(9), or only the outer one in (7) by using (8). So let 



N 



0(xr,xr) = E^'(-i^'')/(^i'^2), 

1=1 

^ r /• 

©(X^) = J f{x\,X2)p{x2\x\)d, 



(10) 
(11) 



In this paper, we shall call 0{x.\-^ ,x.2^) (rcsp. ©(x}-^)) the Bayesian crude 
MC (resp. Bayesian CMC) estimator of 0. 

1.3.2 Discussion 

Let us now compare Q to 0. As in section 1.2, outperforms for all 
but not for the same reasons. Indeed we have 

E(^'(X}^^)/(Xi,X^)|x}^^) =7«Xx}^^) / f{x\,X2)p{x2\x\)dX2. (12) 
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So from (3), the variance of each term of (11) is lower than or equal to that of 
the corresponding term in (10); however this is not sufficient to conclude that 
var(6') < var(6') since the terms may be dependent. Fortunately (12) implies 
that = E(e|xJ^^), so e is preferable to €>, due to (2) and (3). 

Let us now turn to practical considerations. Of course, is of interest only 
if the conditional expectation in (11) can be computed easily. In the rest of 
this paper we will see that this indeed is the case in some Markovian models 
and for other models, we will propose and discuss some approximations which 
make the Bayesian CMC estimator a tool of practical interest for practitioners 
which may be used as an alternative to purely Monte Carlo classical PF. From 
a modeling point of view, by contrast with spatially partitioned RB-PF, the 
state space no longer needs to be multi-dimensional; here a key point is the 
availability (and integrability) of p(x2|a;i), which, in the temporal partitions 
considered below, will coincide with the so-called optimal conditional impor- 
tance distribution. From a numerical point of view, another interesting feature 
of sequential RB-PF is that numerical approximations, when necessary, do not 
need to be propagated over time. 

Let us finally address complexity. As we shall see, in some cases can 
even be computed under the same assumptions and for the same computa- 
tional cost as (see sections 2.2.1 and 3.2.2). Also one should note that 
in the partition (Xi,X2) of a given set of variables (Xo:ri, say) Xi should 
be as small as possible. More precisely, let = E(/(Xi,X2,X3)) and let 
p{xi,X2) be available. Then two Bayesian CMC estimators can be thought 
of : 0^^ built from = E[E(/(Xi, X2, Xa)!^!, X2)], in which the inner 
expectation (w.r.t. X^) is computed exactly, and ©(-^a.^a) built from = 
E[E(/(Xi,X2,X3)|X0] and from p{xi). Estimator ©(^^.Xa) preferable to 
0"^^ , but computing 0(^2,^3) requires an additional exact expectation compu- 
tation, since E(/(Xi, X2, X3)|Xi) = E[E(/(Xi, X2, Xg)!^!, X2)]. As we shall 
see in section 3.2.2, in some Markovian models both estimators can indeed be 
computed; and computing 0(^2,-^3) only requires an additional computational 
cost. 

The rest of this paper is organized as follows. First in section 2 we see that 
in some HMC models (including the Autoregressive Conditional Heteroscedas- 
ticity (ARCH) ones), a Bayesian CMC estimator 0„ can replace the classical 
one 0n in the case where the sampling importance resampling (SIR) algo- 
rithm with optimal importance distribution is used. In Section 3 we develop 
our Bayesian CMC estimators for JMSS; in section 3.1 we address the linear 
and Gaussian case, where our solution can be seen as a further (temporal) 
RB step of an already (spatial) RB-PF algorithm; in section 3.2 we develop 
Bayesian CMC estimators for general JMSS. Finally in Section 4 we address 
a multi-target scenario and adapt Bayesian CMC to the PHD filter. In all 
these sections we propose relevant approximate estimators when the Bayesian 
CMC estimator cannot be computed exactly, and we validate our algorithms 
via simulations. We finally end the paper with a Conclusion. 
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2 Bayesian CMC PF for some HMC models 

2.1 Deriving a Bayesian CMC estimator On 

Let {X„}„>o (resp. {Y„}„>o) be ap- (resp. q-) dimensional state vector (resp. 
observation). We assume that (X„, Y„) follows the well known HMC model: 

n n 

p(xo:„,yo:«) =p(xo) J|/,|i_i(xi|x,_i) J|gi(yi|x,), (13) 

in which /i|i_i(xi|xi_i) is the transition pdf of Markov chain {X„}„>o and 
gi{yi\xi) the likelihood of y^ given x,;. The Bayesian filtering problem consists 
in computing some moment of interest On = Ep^|^ (/(X„)), which we rewrite 
as 

On = J ./(x„)p(xo )dxo:„_idx„. (14) 

So (14) coincides with (6), with Xi — Xo:„-i, X2 = X„, f{xi,X2) depends on 
X2 only, and p{xi,X2) is the a posteriori (i.e., given yo:n) joint pdf 

p(xo:„-i,x„|yo:„) =p(xo:„_i|yo:,i)p(x„|x„_i,y„) . (15) 

" V V ' 

p{xi) p(x2\xi) 

According to (8) we first need an approximation of p{xi), which in model (13) 
reads: 

, I N p(yn|x„_i)p(xo:„-l|yo:n-l) 

P(xO:„-l|yO:«) = -7^-7 : r-, : , (16) 

J p(y„ |x„_l)p(XO:„-l |yO:,i-l)dXo:„_l 

in which p(y„|x„_i) = / 5„(y„|x„)/„|„_i(x„|x„_i)dx„. On the other hand, 
PF algorithm propagate approximations of po:n-i|n-i or of p„-i|„-i. So let 
us start fromp(xo:,i-i|yo:ri-i) = X]il=i '"^n-i'^xj J - According to Rubin's SIR 



mechanism [12] [13] [14] p(xo:„-i|yo:„) = Sili "'n-i^x- where w'. 



0:n-l ' 



n-1 



OC 



w^_iP(yn|x„-i), is an approximation of p(xo:„-i|yo:n). Next p{x2\xi) in (15) 
coincides with the so-called optimal conditional importance pdf, i.e. the impor- 
tance density p(x„ I x„_i,y„) cx ffn(yn|x„)/„|„(x„|x„_i) which minimizes the 
conditional variance of weights w^, given past trajectories and observations 
[15] [16] [17] and [6]. This leads to the so-called SIR algorithm with optimal 
importance distribution and optional resampling step: 

SIR algorithm. Let po:„_ii„_i = J2iLi ^n-i^x'g ^ be an MC approxi- 
mation of Po:n-l|n-l- 

1. For ah i, 1 < i < N, sample x^ p(x„lx^_;^, y„); 

2. For al\i,l<i< N, set cx u;,\_ip(y„lx5,_i), J2f:=i ^« = 1; 

3. (Optional). For all i, 1 < i < N, (re)samplc Xo.„ ^ St=i ^n'^[xj ^.x^j 
and set = -i-; otherwise set (x^,zi;^) = (x^,t«Jj). 
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This third resampling step is usually performed only if some criterion holds, 
and aims at preventing weights degeneracy, see e.g. [1], [2]. Then 



N 



i^?^„=E<'^-0:„-.X, (17) 
1=1 

is a (SIR-based) SMC approximation of Po:n\ni and „|„ plays the role 

of ;3(a;i, 0:2) in (9). Finally from (10) and (11), the SIR-based crude and CMC 
estimators of moment 0„ defined in (14) are respectively 

N 



e^i^(xj:^i,5,r) = E<(^o;,li)/(x^), (18) 

1=1 

^ r 

Ol'^'i^^u-i) = E<(^o;n-i) y/(x„)p(x„|xj,_i,y„)dx„. (19) 



i=l 



2.2 Computing 0„ in practice 
2.2.1 Exact computation 

From (12) we know that 0^^^ outperforms ©fj^^ ; but 6*^^^ can be used only if 
•wl^ and integral / /(x„)p(x„|x5j_]^, y„)dx„ can be computed. As we now see, 
this is the case in some particular HMC models and for some functions /(.). 
Let us e.g. consider the semi-linear stochastic models with additive Gaussian 
noise, given by 

X„ = f„(X„_i) + K„(X„_i) X U„, (20) 
Y„-H„X„+V„, (21) 

in which {U„} and {V„} are i.i.d., mutually independent and independent of 
Xo, U„ - 7V(0; I) and V„ - Af{0] R^^). The one-dimensional ARCH model is 

one such model with /„(a;„_i) = 0, fc„(a;„_i) = Po + Pi^n-i and H„ = 1. In 
model (20) (21) p(x„|x„„i, y„) and p(y„|x„_i) are Gaussian. More precisely, 
let Q„(x„_i) = K„(x„_i)K„(x„^i)^; then 

L„(x„_i) = H„Q„(x„_i)H^ + R;;, (22) 
m„(x„_i,y„) = f„(x„_i) + Q„(x„_i)H^L~^(x„_i)(y„ - H„f„(x„_i j(^3) 
Pn(x„-i) = Q„(x„_i) - Q„(x„_i)H^L,7^(x„_i)H„Q„(x„_i), (24) 
p(x„|x„_i,y„) =7V(x„,m„(x„_i,y„),P„(x„_i)), (25) 
p(yji|x„-i) = 7V(y„,H„f„( (x„_i)). (26) 

Finally in such models the Bayesian CMC estimator 6*^^^ is workable for some 
functions /(.). If /(x) is a polynomial in x, the problem reduces to computing 
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the first moments of the available Gaussian pdf (25). In the important par- 
ticular case where /(x) = x (used to give an estimator of the hidden state), 
no further computation is indeed necessary; in this case the integral in (19) is 
equal to m„(x5j_i, y„). 

Remark 1 In this class of models, computing or 6*^^^ requires the same 
computational cost if /(x) = x. Both estimators indeed compute the param- 
eters m„(x^_;^,y„) and P„(x*j_i) of p(x„|x^_;^, y„), and use these pdfs to 
sample the new particles x^, which in both cases are needed for the next 
time step. The only difference is that O'^^ = X]i=i'"^n^rii while = 
E.!Ii<m„(x;_i,y„)- 

2.2.2 Approximate computation 

Let us now discuss cases where the Bayesian CMC estimator 0^^ cannot be 
computed exactly because p(y„|x„_i) and/or moments of p(x„|x„_i, y„) are 
not computable. Two approximations are proposed: 

— Available techniques such as local linearizations [6], Taylor series expan- 
sion [18] or the Unscented Transformation (UT) [19] have already been 
proposed for approximating p(y„|x„_i) and a moment of p(x„|x„_i, y„), 
so one can use any of them in (19). The resulting algorithm can be seen 
as an alternative to solutions like the Extended Kalman Filter (EKF) or 
the Unscented Kalman filter (UKF), where we look for approximating the 
filtering pdf by a Gaussian and which rely on linearizations or the UT; 
or to SMC methods, where we look for a discrete approximation of p„|„. In 
our approximate Bayesian CMC technique, we start from a discrete approx- 
imation of p„_i|„_i produced by an SMC method, then similarly to the 
EKF/UKF, we look for a numerical approximation of ©n, given that dis- 
crete approximation of p„_i|„_i. However, deriving a good approximation 
of p(y„|x„_i) can be an intricate issue, so we next look for approximations 
which do not rely on an approximation of p(y„|x„„i). 

— In the SIR algorithm used so far, xj, is drawn fromp(x„|x^_j^,y„), whence 
a weight update factor equal to p{yn\'x.\_i). On the other hand, sampling 
xjj from an alternate (i.e., not necessarily optimal) pdf q(x„|x5j_]^) yields an 
approximation of po:n-i|n given by Pom-ilTi = '"^n'^xj where weights 
wl^ are now proportional to <_i/„|„_i(x^|x^_i)5„(y„|x^)/g(x5,|x^_i), 
and so depend also on the new samples {x^}^^. In that case, the associated 
Bayesian CMC and crude estimators become 

N 

^"(^Oin'-l' ^ri^)=^^ '^n(^0:ri'-li ^n^)/(^n)i (27) 

1=1 

e„(xJ:^_,,Si^^)=^<(xJ:^_i,Xi^^)//(x„)p(x„|x;_i,y„)dx„,(28) 
i=i •' 



Title Suppressed Due to Excessive Length 



9 



which can no longer be compared easily (it was the case in section 1.3, 
because the weights in in (10) and in (11) depend on {x\}^i 
only). On the other hand, the computation of (28) does not require that of 
p(yn|x„_i), but only that of / /(x„)p(x„|x5j_i, y„)dx„. This is of inter- 
est in some models where approximating p(y„|x„_i) may be challenging 
because of the form of gn, while the first order moments of p{xn\xn-i, y-a) 
can be computed or approximated easily [18]. 

The two approximate implementations of the Bayesian CMC estimator which 
we just discussed will be compared via simulations in section §2.4.3. 



2.3 Alternate Bayesian CMC solutions 

2.3.1 A Bayesian CMC estimator based on the fully-adapted auxiliary 
particle filter (FA) 

The SIR algorithm of section 2.1 is not the only SMC algorithm which enables 
to compute an approximation of p(xo:„_i, x„|yo:„) in which weights depend 
on {x^^„_i},^i only. Starting from p(xo:„_i |yo:,i-i) = K-i'^xj,^^,,, the 

so-called FA algorithm [20] [21] is one such alternative: 

FA algorithm. Let Po:n-i|Ti-i = X^ili '"^n-i'^xji ^ be an MC approxima- 
tion of Po:«-l|n-l- 

1. For ah i, 1 < i < N, set w'^ cx <_ip(y„|x^^„_i), X)^! = 1; 

2. For ah i, 1 < i < N, sample x^,„_i - X^^^i ^nK},_^_,, 

3. For all i, 1 < i < N, sample xj^ ^ p(x„|x5j_j, y„) and set w\ = 

^0:ti ~ [^0:n-l! ^ni- 



ls the FA-based SMC approximation of PQ:n-i,n\m a-nd the FA-based crude 
and CMC estimators of 0n become respectively 



Finally 




(29) 




(30) 



i=l 
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2.3.2 Discussion 

Comparing with section 2.1, we see that two Baycsian CMC estimators are 
indeed available: the SIR-based one 0^^^ given by (19), and the FA-based one 
given by (31). The natural question which arises at this point is thus to 
wonder which one is best. Two arguments are available. 

Let us first start from a common MC approximation Po:n-i\n-i — EiLi 
<j-i'5xj,^„_i of po:„_i|„_i. Given po:«-i|«-i and y„, trajectories {x^:„_i}ili 
produced by the FA algorithm are i.i.d. from J^iLi ^n'^xj ^ . As is well known, 
resampling introduces variance, so given po:n-i|n-i '9^^ is preferable to 6*^^^, 
and should not be used in practice. 

On the other hand, the performances of Of^^ also depend on the weighted 
trajectories {(xq.„_j^, w'^n_i)}fLi which are available at time n — 1; so one can 
wonder whether one should propagate them via the SIR algorithm, or via the 
FA one. 

This actually is a thorny issue, because in the SIR algorithm the resampling 
step is optional and is often performed according to a particular criterion, like 
an estimator of the so-called number of efficient particles [16] [17]. So compar- 
ing the set {x'j, WnJ'j^i produced by the SIR algorithm before the resampling 
step to that {xjj, l/N}fLi produced by the FA algorithm, is a challenging task, 
and indeed it has been proved in [22] (from an asymptotical point of view) 
that none algorithm always outperforms the other. 

If however we assume that the resampling step is done at each time step, 
then it is well known [23, Ch. 9] that the set of samples produced by the FA 
algorithm is better (in an asymptotic normality sense) than that produced by 
the SIR algorithm after the resampling step. This can be easily understood 
empirically from a simple argument. Starting from a set of weighted samples 
{^0;n-iT''^n-i}iLij the number of different particles {xj^} produced by the FA 
algorithm is equal to N, while that produced by the SIR one (after resampling) 
is lower than N , and can consequently lead to a poor approximation of p„|„. 



2.4 Simulations 

In section 2.4.1 we compare via simulations two Bayesian CMC estimators 
0^^^, which differ only by the set of weighted points po:n-i|n-i upon which 
they rely at each time instant n — 1: this set will be either propagated by 
the SIR algorithm {Ol^^'^), or by the FA one (0^^^'^). In section 2.4.2 we 
compare and Of,^^'^ in the ARCH model. In section 2.4.3 we compare 
the two approximations of 0„ described in section 2.2.2, and the weighted 
trajectories are propagated by the SIR algorithm. We compute the empirical 
mean square error (MSE) at each time step, averaged on P = 200 simulations; 
the true mean is computed by the Kalman filter (KF) in the Gaussian case, 
or a bootstrap filter [24] with — 10^ particles otherwise. 
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2.4-1 Gaussian Model 

Wc first consider a linear and Gaussian model described by (20)-(21) where 
fn{xn-i) = 0.9a;„_i, Hn = 1, fc„(a;„_i) ~ ^/TO and R!^ = 1. We want to 
estimate the hidden state, so f{xn) = Xn- We compute the SIR- and FA- 
based Bayesian crude and CMC estimators with N — 1000 particles; of course 
KF, which computes Ep^|^(x) exactly, is here the benchmark solution. MSEs 

of the four estimators are displayed in Fig. 1. 6*^^^'^ (resp. 0^^'^) always 
outperforms Of-^^ (resp. 6*^^). Note also that 0f/^ does not always outperform 
Of^^, which is in accordance with the asymptotical analysis [22]; while 0^^^'^ 
always outperforms 6*^^^^'^. 



Fig. 1 MSE - Gaussian model, _R = 1, Q = 10, TV = 1000 - = x„. Estimator CMC- 

SIR-2 which propagates the samples with an FA algorithm outperforms CMC-SIR-1 which 
uses a SIR algorithm. Both CMC estimators outperform the crude ones. 



2.4.2 ARCH Model 

We next consider the ARCH model recalled in section 2.2.1 We set R^^ = 3, 
/3o = 1 and l3i = 0.1. We want to estimate Xn (so f{xn) = Xn), and the 
variance of the process noise (so f{xn) = l3o + Pix'^). Since p{x„\xn-i,yn) 
is Gaussian (see (25)), it is possible to calculate both moments. We compare 
(9^^(1000) and 9l^^''^{1000), both computed with = 1000 particles, and 
©^^^'^(lOO). MSEs are displayed on Fig. 2 for the estimate of Xn and Fig. 
3 for the variance of the process noise. As we see 0^^^'^(1OOO), and even 
^SiR,2^-[_00), both outperform 0,^^(1000). However the gap between the three 
algorithms is function dependent and so the previous considerations are model 
and function dependent. 

2.4.3 Stochastic Volatility Model 



Let us consider the following model: 

Xn+l = <PXn + Un 

r„ = /3exp(A„/2) X Vn 



(32) 
(33) 
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Fig. 2 MSB - ARCH Model - /3o = 1, /3i = 0.1 and = 3 - = x„. The CMC 

estimator with A'^ = 100 particles outperforms the crude one with N = 1000 particles and 
is close to the CMC one with N = 1000 particles. 




Fig. 3 MSE - ARCH Model - /3o = 1, /3i = 0.1 and -R^ = 3 - = /3o + /3i^^_i- The 

gap between the three estimators depends on function /(.) and has decreased compared to 
the previous simulation. 



in which Un ~ Af{Q,a'^) and Vn ^ J\f (0,1). In this model On is not com- 
putable, whatever function /, because p{yn\xn-i) is not computable. We pro- 
pose to compare two approximations of the Bayesian CMC estimator with 

sift, 1 

a SIR based crude estimator. Our first approximation 6'„ ' only relies on 
the approximation of J f{xn)p{xn\xn-i,yn)dxn (second item in §2.2.2) while 

the second one 0„ ' relies in addition on that of p(y„|a;„_i) (first item 
in §2.2.2). In this model, an approximation of p(y„|a;„_i) is obtained by a 
first order Taylor series expansion of function log((7n(2/n|a;ri)) in ^Xn-i- If the 
deduced approximation of gn{yn\xn) is noted ^„(j/„|a;„) then p(j/„|.T„_i) = 

n—1 {'^n I '^n—l )gn{yn\Xn)dXn whcrC /„| 

n—1 {'^n I '^n—l 

is now computable. If a is small, /„|„_i(2;„|2;„_i) is approximately non-null 
for values close to <?x„_i, and for such values dniUnlxn) is a good approxima- 
tion of gn{,yn\xn)- So One should get a good approximation p{yn\xn-i) when 
a is small. Finally, a deduced approximation of p(a;„|a;„_i, y„) is given by a 
Gaussian pdf, see [20]. 

We estimate the standard deviation of the observation noise at time n so 
/(x„) = /3exp(a;„/2). We first take ^ = 0.8, /3 = 0.6, a = 0.18. Resuhs are dis- 
played in Fig. 4. We observe that both approximations of the Bayesian CMC 
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Fig. 4 MSE - Stochastic Volatihty Model - <P = 0.8, /3 = 0.6, a = 0.18, TV = 1000. Both 
approximate CMC estimators outperform the crude one. 




Fig. 5 MSE - Stochastic Volatility Model - <? = 0.8, /3 = 0.6, a = 0.4, TV = 1000. Only 
the approximate estimator CMC-1 outperforms the crude estimator. This is because the 
approximation of p(yn|x,i-i) used by the approximate estimator CMC-2 is not reliable 
with these parameters. 



estimator outperform the crude SIR-based one, and that the second approx- 
imation, which does not use x^'^, is preferable. However, in Fig. 5 we take 
(T = 0.40. Remember that increasing a has consequences on the approximation 

of p{yn\xn-i)] as expected, 0„ ' , which rehes on this approximation, is out- 
performed by the two other estimators. It is particularly interesting to notice 
that the first approximation 6?„ ' is not affected and still outperforms the 
SIR based estimator. This confirms that Bayesian CMC estimators can still 
be of practical interest in models which are not semi-linear. 



3 Bayesian CMC algorithms for JMSS models 

As in §2 we still consider the estimation of On = / 0(x„)p(x„|yo:„)dx„ (the 
reason why we replaced / by </> will become clear a few lines below), but now 
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in a so-called JMSS: 

n n n 

p(ro:„,Xo:„,yo:„)=p(ro)]^p(ri|ri_i)p(xo)]^/,|i_i(xi|xi_i,ri)]^gi(yi|xj,ri). 

i=l i=l 1=0 

(34) 

Model (34) can be thought of as an HMC model, in which and gi depend 

on the realization of a discrete Markov Chain {i?„}„>o where each i?„ takes 
its values in {1, • • • , A"}. So now both Xn and i?„ are hidden, and 0„ can be 
rewritten as 

6)„ = ^ / 0(x„)p(x 

n ; l'0:n |yO:n 

)dx„. (35) 

ro:„ •' 

Note that 0(.) can also depend on r„. As is well known [25] [26] [7], in a JMSS 
exact Bayesian filtering is either impossible (in the general case) or an NP- 
hard problem (in the linear and Gaussian case) , so one has to use suboptimal 
techniques. Among them, SMC methods can be divided into two classes: 

— In the first class [27] [28] [23] On is computed by injecting an SMC approx- 
imation of p(xo:„,ro:„|yo:,i) into (35); 

— In the second class of SMC methods we start from 

On = ^ p{YO:n\yQ:n\ j <?!>(x„)p(x )dx„, (36) 

PF KF 

and propagate an SMC approximation '"'n'^rj of p(ro:n|yo:n) only; 

then On is computed as 

On = X!^" / (x,0 P (xn I r ^ „ , yo: „ ) dx„ , (37) 

in which p(x„|rQ.„, yom) is computed exactly via KF if model (34), con- 
ditionally on ro:„, is linear and Gaussian, i.e. if /i+i|i(xi_|_i|x,;, r^+i) and 
(7i(yi|xj;, r^) are Gaussian with means linear in x^ [7]. 



3.1 Bayesian CMC algorithms for linear and Gaussian JMSS models 
3.1.1 Deriving the Bayesian CMC algorithm 

In this section we begin with the second class of algorithms. Let us first see 
that (36) coincides with (6) (in which the integral is replaced by a sum, since 
Rn is discrete), up to the identification: Xi = Ro:„-i, X2 ~ Rm f{xi,X2) = 
J 0(x„)p(x„|ro:„-i,r„,yo:„)dx„, and p(a;i, 2:2) is the joint pdf 

P(l"Q:n|yO:«) = P(l"0:«-1 |yO:n) |l"0:n-l , YOin) ■ (38) 
V V ' 
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We need to compute both factors (we cannot simply apply the results of §2.1, 
because in (34) the marginal chain (i?„, Y„) is not an HMC, as was (X„, Y„) 
in (13)). We first need an approximation of p{xi), i.e. of 

p(ro:„-i|yo:„) = — . . r. (39) 

^ro:„-i?'(ynl''0:n-l,yO:ri-l)p(rO:n-l|yO:n-l) 

However the SMC algorithm propagates approximations of p(ro:,i|yo:n). So let 

:n— 1 |yO:n— 1 

) = "'n-i'^r'. ; applying again Rubin's SIR mechanism, 

p(ro:n-^i|yo:7i) = Eili '^n-i'5rj^„_i > where 



N 

<_i oc <_ip(y„|yo:„-i,r^,„_i), = (40) 

4=1 



is an MC approximation of p(ro:„_i|yo:n)- Next from (34), the second factor 
p{x2\xi) of (38) can be rewritten as (here Af stands for numerator): 

, I » ^ P(yn|yO:n-l,r^:„_l,?'„)p(r„|r,\_i) 

P{rn\ro:n-l,yO:n) = J ■. 1 ^ TJ^ ■ (41 

P(yn|yO:n-l,rf),„_J = T,r„^ 

Note that as in section 2, p(r„|rQ.„_]^, yo:n) is the optimal conditional IS distri- 
bution, i.e. that which minimizes the conditional variance of the weights, given 
ro:«-i and yo:„. Finally, setting /(r^^„_i,r„) = E((/i(x„)|yo:„, r^^^.^, r„), the 
Bayesian CMC and crude estimators respectively read 

N 

0n(rj:^i) = 5]<_i(ri:^i)5^/(rJ,,,_i,r„)p(r„|4„_i,yo:„),(42) 

i=l r„ 
N 

e„(rj:^i,ri^^) = ^ <_i(rj:^i)/(4„_i, <), (43) 

i=l 

in which r,\ - p(r„|r^^„_i, yo:„). 

3.1.2 Computing On in practice: linear and Gaussian JMSS 

Implementing On requires that (40) and (41) are computable, and that in (42) 
the conditional expectation /(rQ.„_j^, r„) is computable too. We thus need 
to compute p(yn|yo:n-i5 ro.„„i, r„), which is not possible in general JMSS 
models. So let us now assume that the JMSS (34) is moreover linear and 
(conditionally) Gaussian: 

Rn is a discrete Markov Chain, (44) 
(i?„)X„_i + G„(i?„)U„, (45) 
(i?„)V„, (46) 

where Xg, Ui, • • • ,U„ and Vq, • • • , V„ are independent and independent of 
Ror-- ,Rn- We set Xq - AA(mo, Pq), U„ - AA(0, Q„) and V„ -^(0,R;;). 
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Then let p(x„_i|r^^„_i,yo:„-i) = A/'(x„_i ; m,\_-^|„_^; P;_-^|„_ J. Then p(y„| 

yO:ri— Ij Torn— 17 

) is given by the predicted observation mean and covariance 

of the KF, i.e. 

p(y«|yo:n-i,4„_i,r-„) -A/'(y„;y;(r„);S;(r„)), (47) 

where 

Ynirn) - y„ - H„(r„)F„(r„)mj,_i|„_i, (48) 
S;(r„) = H„(r„)Pj,|„_i(r„)H„(r„)^ + L„(r„)R;;L„(r„)^ (49) 
P;|„_i(r„) = F„(r„)Pj,_i|„_iF^(r„) + G„(r„)Q„G„(r„)^. (50) 

In summary, (47)- (50) enable to compute (40) and (41), and finally (42). 

Remark 2 Estimator On in (42) is the Bayesian CMC counterpart of On in 
(43), which itself coincides with the so-called RB SMC estimator (37) for 
JMSS [7]. Indeed, On corresponds to O^^ in (5) where xi = ro-n, X2 = xo:,i, 
/(xi,X2) = 0(x„) and gi(r„|ro:„-i) = p(r„|ro:„_i, yo:„). So (42) can be seen 
as a further RB step of an already RB SMC estimator; the RB step leading to 
(37) was a spatial one, since PF was performed on variables ro:n, rather than 
on the extended state (xo:„, ro:„); here this second RB step is temporal, since 
in (42) PF acts on ro:„_i, rather than on ro:„. So here is an example where 
we can jointly use the classical RB-PF and our CMC Bayesian technique; but 
we will see in the next section that a CMC Bayesian estimator can also be 
derived in JMSS models in which classical RB-PF is not available. 

Remark 3 One should observe that if On can be computed. On can be com- 
puted as well; so the variance reduction can be achieved under the same 
assumptions (linear and Gaussian JMSS) as those needed for the RB SMC 
estimator [7]. On the other hand this new variance reduction involves an ex- 
tra computational effort, which however is not prohibitive (at least if K is 
small), as we see from (42) and (43). First, weights in (40) have to be 

computed by both algorithms. Next for each i, 1 < « < iV, both algorithms 
compute {p(''Ti|rQ.„_i, yo:n)}^^^i. The difference is that in the CMC algorithm 
we compute directly means Y.r^ /(i"o:n-i: ^n)p(?'n|ro:n-i7 yo:ri)i which requires 
running K KF updating steps per trajectory Y\.n-i while the crude estimator 
first extends randomly each trajectory before computing conditional expecta- 
tions. 

Remark 4 Finally our Bayesian CMC estimator On stems from the RB PF 
which, itself, assumes that the JMSS model is conditionally linear with ad- 
ditive Gaussian noise. If this is not the case, but the non-linearities are not 
too severe, one can approximate E(0(x„)|yo:„, rom) by EKF or UKF, and 
next compute On from such an approximation, by using an approximation of 
p(y7i|yo:«-i,ro:„), also given by EKF or UKF. 
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3.2 Baycsian CMC algorithms for non linear JMSS models 

In this section we derive Bayesian CMC estimators in non linear JMSS models, 
in the case where, by contrast with Remark 4 above, it is not possible to 
approximate E((/)(x„)|yo:„, ro:„). In that ease we need to turn back to the first 
class of SMC methods for JMSS (see the beginning of section 3), which consists 
in propagating an SMC approximation of p(xo:„, ro:„|yo:n) [7] [29]. 

3.2.1 Deriving Bayesian CMC estimators 
Let us first rewrite On as 

— 1 ; l'0:n— 1 7 |yO:n 

)dxo:„-idx„. (51) 

ro:„-l,i-„ 

Let now J2iLi''^n-i{^0:n-iT^o-.n-i)^xl j.rj, ^ bc an MC approximation of 

p(xo:„_i,ro:„_i|yo:,i-i). Tlienp(xo:„_i,ro:„_i|yo:„) = Ejll ^«-l'5xj^„_^,rj,^„_j 
in which 

N 
i=l 

is an MC approximation of j>(xq;ii_i, rom— i|yo:n 

). Let also 

«>0 ~p(x„,r„|xjj_i,r;_i,y„). (53) 
Then the associated crude MC estimator is given by [7] [29]: 

N 

0„(xJ:^,rJ:^) = ^ 5i;_i(xi:^i, ri:^i)</'(x;), (54) 

i=l 

We now propose two Bayesian CMC estimators of On, associated to two 
different partitions of (Xo:ri, Ro:n)- Setting Xi = (Xo:„_i, Ro:n) and X2 = X„ 
leads to 0^"; setting Xi = (Xo:„_i, Ro:„-i), X2 = (X„,i?„) and /(x*„_;^,y„, 
i"ri) = / (/>(x„)p(x„|x;_i,r„,y„)dx„ leads to ©i"^"'"^"', with 

N 

"(Xoiri-li ^0:n )^^^ ""^n- 1 /(x„_l , Yn , r^) , (55) 
i=l 

r 

o^^-'''-\^l^::!_,yo\n^,)^Y.<-^T.h^^^^ 

i=l r„ •' 

N 



i-ip(ynK_i, <_i), ^ <_i = 1 (52) 



iX! K- 1 ' 1 ' Vn ) /(x^_ 1 , y„ , r„ ) , 



1=1 



in which <„i(xj;^_i, rj:^_i) is given by (52), and in (55) - p(r„|x^_i 



<-i;y«) (a marginal of (53)). Of course, 0^" = E(e„|xJ:^_i, r;^;^, yo:„) 



and = E(0Mxi:^_i,ri:^_i,yo:„), so var(0f-^"^) < var(e^) < 

var(6>„). 
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3.2.2 Computing 0^" and 0}; 



in practice 



Let us now discuss when (55) and (56) can be computed. In model (34), 

) = Er„p(^"K-i)p(y 

n 1 ^n) ^^Ild p{yn l^^^—i ? ^n— 1 



^^-uYn) ^ Pirn\rl-i)piyn\K,-i,r„)p{xn\xl^_i,r„,yn). So let 



K-iP{r )p(y 



(57) 



Er„ E»=l <-lP(?-nk;_i)p(y 

(note that ^ri-i(^n) — 'S'ri-i)- Then 0^" and ©^'''""'^"^ can be rewritten 



as 



TV 



N 

^ y: 



(x„)p(x r;)dx„, (58) 



E^n-i('"n) y 0(x„)p(x„|x^_i,y„,r„)dx„ 



(59) 



So (59) is computable as soon as (58) is computable. On the other hand, 
0^- is a generalization of (19): p(x„|x„_i, y„, r„) and p(y„|x„_i, r„) play 
the same role as p(x„|x„_i, y„) and p(y„|x„_i) in §2.1 except that we have 
now introduced a dependency in r„. This means that 0^" and 0^"'^'"^ are 
computable as soon as the Bayesian CMC estimator (19) of §2.1 is computable 
in the underlying HMC model (i.e., the HMC model to which the JMSS re- 
duces when the jumps are known), see section 2.2.1. For example, semi- linear 
stochastic models (including the ARCH ones) with Markov jumps are a class 
of models in which (58) and (59) are computable. 

Finally the only difference between (58) and (59) comes from the compu- 
tational cost that we discuss now. For a given i, in (59) the computation of 
/ 0(x„)p(x„|x5j_]^, r„, y„)dx„ has to be done for all ?■„, while in (58) it has 

only to be done for the which has been sampled. So as expected, 
is preferable to 0^" but requires an extra computational cost. On the other 
hand, comparing the computational cost of 0^" and 0„ is the same issue as 
comparing that of the Bayesian CMC estimator (19) to that of the crude MC 
one (18) in Section 2.2, and is thus problem dependent. However, one should 
observe that in the particular case described at the end of section 2.2.1, i.e. 
when sampling according to p(x„|x„_i, r„, y„) requires the computation of 
/ (/)(x„)p(x„|x„„i, r„, y„)dx„, then the computation of 0^" does not involve 
an extra computational cost as compared to that of 0„. 



3.2.3 Approximate computation 

Let us finally discuss on approximate computation of 0„ when p(y„|x„_i, r„) 
and p(x„|x„_i, y„, r„) in (59) are not available. First notice that (59) can be 
computed with the same numerical approximations as those which were used 
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in the computation of (19) (see section 2.2.2 above), except that they have to 
be done for all possible values of r„. However, r„ is discrete and as we now 
see, one can derive other approximation techniques: 

- In (34) we have p(r„, x„, y„|x„_i, r„„i) = p(r„|r„_i)/„|„_i(x„|x„_i, r„) 
3n(yTi|x,i, r„) so the numerator of (59) can be rewritten as 



TV 



E I ''ti - 1 ) - 1 /« I n- 1 (Xn I x,\_ 1 , r„ )5„ (y „ I x„ , r„ ) 



1=1 



If for a given r„ the integral is not computable, one can approximate it 
with IS by sampling x^"'' ^ g(x„|x^_]^, r„) for alH, 1 < i < and for all 
r„. An approximation of the numerator is then given by Si^i <^(^n"'*) 
p(rn|rj,_i)<_i/„|„_i(x;"'*|xj,_i,r„)5„(y„|x;;"'%r„)/g(x;-^|xj,_i,r„).So 
we do not use samples for the discrete part r„. We apply the same approx- 
imation for the denominator which can be rewritten as 12iLi ^n-i 
n—i{^n\fm xjj_ ) (7„ (y„ |x„ , r„)dx„; 
— When the optimal distribution p(x„, r„|x„_i, r„_i, y„) is not available, it 
has been proposed [7] to sample independently (x^, r^) according to an im- 
portance distribution 5(x„, r„|x5,_i, r^.^) = g(r„|x^_i, rj,_i)g(x„|x^_i, 
^ri,f^-i), for all i, 1 < i < N, then to compute the estimator 6*^^^ = 



member that the Bayesian CMC estimator 0„ is actually the expectation of 
the crude MC estimator On given some variables. One can wonder if it is not 
possible to compute E(6'^^'^|{r^_;^,x5j_;^,x^}^j,y„), i.e. to compute the 
expectation of 0f^^ as a function of r„. Since </>(.) does not depend on r„, 
it is equivalent to compute the conditional expectation of the unnormalized 
weights and so to reduce their variance. Unfortunately this is not pos- 
sible because of the normalization factor. However one can compute sepa- 
rately the conditional expectation of the numerator and that of the denom- 
inator. This is an easy task since r„ takes its values in a discrete set, and 
9(7'„|x„,x„_i, r„_i)=g(r„|x„_i, r„_i)(7(x„|x„_i,r„, r„_i)/X]r„9(''™|xn-i, 
r„_i)(3'(x„|x„_i, r„, r„_i). This variance reduction of the unnormalized 
weights comes from a normalized IS implementation of (59) which is rewrit- 
ten as 

)5n(y«| ^n)] ^.X^^ 

)] dx„ 
(60) 

with the importance distribution 
9(x„| 
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Note that the computation of the new weights is not prohibitive as long as 
K « N. 



3.3 Simulations 



We now test our approach in a linear and Gaussian JMSS model, described 
by equations (44)-(46) in which 



F„(r) 



1 



sin(cJrT) 







cos(a;rT) 



I-cos(l^^T) 



- sin(a;rT) 

sin(tJr7') 



sin(a;rr) cos(aJrT) 



Qr 







' rj-i3 rp'2 

^ T 

^ ^ 

^ T 



10 
10 



jI 

-I 



and G„(r) = I, T = 2s, Oy ~ im? / sec' and 



= o'y = 10m. We track a maneuvering target described by its position and 
velocity in the Cartesian coordinates, x„ = [px,Px,Py,Py]n- Mode r„ repre- 
sents the behavior of the target: straight, left turn and right turn. Remember 
from 3.1.2 that the computation of On involves an extra computational cost 
compared to that of On ■ So we compute the efficiency over P = 200 simulations 
defined as [30] 



Eff(n) 



1 



MSE(n)E(C(n)) ' 



(61) 



where C{n) is the CPU time to compute the estimator, and we discuss the per- 
formances of On and On in function of the model parameters. Both estimators 
are computed with N = 1000 particles. 



We first set wi = rad.s 



37r/180 rad.s andwa = — 37r/180 rad.s 



The Markovian transition probability is p(Tn|''n-i) = 0.4 if r„ = r„_i and 
pi^nlfn-i) = 0.3 otherwise. In Figure 6, we display the (averaged) efficiency 
of both estimators over time. The efficiency of On is greater than that of On , so 
the Bayesian CMC estimator for linear JMSS is of practical interest. Note that 
the dependency of the model in {r„} is weak since Wr is small and the Marko- 
vian transition probabilities are close. So distributions {p{rn\^o;n-nyo:n)}iLi 
tend to be uniform, and remember that^6>„ computes directly the expectations 
according to these distributions while On uses samples r^'^ according to them. 
This is why the gap between both estimators gets larger when distributions 
{p(r„|rQ.„_2, yo:n)}ili become almost uniform. 

Next we increase the dependency of the model in by setting uJi = 

rad.s"^, uj2 = Stt/ISO rad.s"^ and 0J3 = —Sir/ISO rad.s""^. We also set 
p{rk\rk-i) = 0.6 if rk-i and p{rk\rk-i) = 0.2 otherwise. In Figure 7, 

we display the averaged efficiency of both estimators for this new set of pa- 
rameters. Indeed, the gap between both estimators is reduced but On still 
outperforms On- 
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Fig. 6 Efficiency - Linear JMSS Model - Close Markovian transition probabilities - 4>{xn) = 
x„. Due to the weak dependency of the model in r„, it is dangerous to sample new particles 
(Crude SIR) before computing the estimator. 




Fig. 7 Efficiency - Linear JMSS Model - Dispersed Markovian transition probabilities - 
0(x„) = x„. Contrary to the previous simulation, distributions {p{rn\^li.„_-i,yo:n)}fLi are 
dispersed so the gap between the crude SIR estimator and the CMC SIR one shrinks. 

4 Bayesian CMC algorithms for Multi- Target filtering 

In this final section we apply CMC to multi-target filtering. Some adaptations 
are necessary, because in the multi-target context we do not necessary deal 
with classical pdf. However, the discussion in section 1.3 still holds, as we 
shall see. Let us begin with a brief review of multi-object filtering. 



4.1 A brief review of Random Finite Sets (RFS) based multi-target filtering 

Multi-object filtering extends the previous problem in the sense that we now 
look for estimating an unknown number of targets from a set of observations 
which are either due to detected targets or are false alarms measurements. 
Classical solutions such as the Joint Probabilist Data association filter [31] or 
the Multiple Hypothesis Tracker [32] include a matching mechanism between 
targets and observations. Alternate solutions are based on RFS, which are sets 
of random variables with random and time- varying cardinal (see e.g. [33]). The 
interest of RFS based techniques over classical solutions is that they no longer 
require such a matching mechanism. The RFS formulation was first used to 
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derive the multi-object Bayesian filter, which generalizes the classical single 
object one [3]. This multi-object Bayesian filter involves the computation of set 
integrals of multi-object densities, i.e. of positive functions f{X) of a given RFS 
X, and cannot be computed in practice (SMC approximations can however 
be of interest when the number of targets is small [34]). Later on, Mahler 
proposed to propagate a first order moment of the multi-object density, the 
so-called PHD or intensity [3]. Let |Xn 5*1 be the number of objects in RFS X 
which belong to region S; then the PHD w(x) is defined as the spatial density 
of the expected number of targets, i.e. 

w(x)dx = E(|Xn S"!). (62) 

S CIRP 

Its interest in multi-object filtering is twofold; first, / w(x)dx is an estimate 
of the number of targets; in addition, extracting the states consists in looking 
for regions where the PHD is high, and so local maxima of v are required. 
Let now t'„(x) be the a posteriori PHD, i.e. the first order moment Wri(x) 
of the multi-object density at time n, given the set of past measurements 
Za-.n = {Zi, ■ ■ ■ ,Z„}, where Zk is the set of measurements available at time 
k. The PHD filter is a set of equations which enables to propagate w„ and 
which has the advantage to make use of classical integrals only. If we assume 
that the cardinality distributions of the number of targets and of false alarm 
measurements are Poisson, and that each target evolves and generates obser- 
vations independently of one another, then PHD u„ is propagated as follows 
(we assume for simplicity that there is no spawning) [3] [33]: 



Vn\ 71— 1 

(x„) )v„_i(x„_i)dx„„i +7«(xn),(63) 

Wn(x„) = [1 -Pd,n(x„)] W„l„-l(x„) 

(x„) 



E 



(Xn) dx 

n 



(64) 



where Ps.n{-) (resp. Pd.n{-)) is the probability of survival (resp. of detection) at 
time n which can depend on state x„_i (resp. on x„); and k„(.) (resp. 7n(.)) 
is the intensity of the false alarms measurements (resp. of the birth targets) 
at time n. 



4.2 Deriving the Bayesian CMC PHD estimator 

Let us now turn back to the derivation of a Bayesian CMC PHD estimator. 
First, the problem we address is to compute the moment On ~ J /(x„)t;„(x„)dXi 
(typically, wc shall take either /(x„) = 1 or f{:x-n) = Is(xn), where S is some 
region of interest). From now on we assume that pd,n docs not depend on x„. 
Plugging (63) in (64), the PHD at time n can be written as 

4 

Vn (x„ ) = ^ Vn (x„ ) , (65) 

i=l 
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where 

vl{Xn) = [1-Pd,n] y Ps,ji(x„-l)/„l„_i(x„|x„_i)u„_i(x„_i)dx„_i, (66) 

vli^n) = [1 - Pd,n] 7«(Xn), (67) 
■?/ N V^Pd.nffr,(2i|x„)/ps.ri(x„-l)/„|„_i(x„|x„_i)u„_i(x„_i)dx„_i 

-nM=i:- ^ ^) ' (68) 

.,1(x„) = E (69) 
and where 

i3„(z) = Ac„(z)+Bi(z) + B2(z), (70) 

Pd.nQn 

(z|x„) / )fn\ )dx„_idx„ 

i)p(z|x„_i)ii„_i(x„_i)dx„_i, (71) 

-Sn(z) = y"pd,n5n(z|x„)7„(x„)dx„. (72) 

Term (resp. w^) is due to non-detected persistent (resp. birth) targets, while 
v'^ (resp. vf-^) is due to detected persistent (resp. birth) targets. 
From (65) we see that 

i=l C ^ 



so we now consider whether one can adapt the Baycsian CMC methodology of 
section 1.3 to any of the moments 01^. First, note that z;^(x„) and w^(x„) do 
not depend on w„_i(x„_i) so we use a crude MC procedure to compute 0^ and 
0^. Let Vn-i = J2i=i^ ^n-i^xj^_j and 7„ = I]fji MC approxima- 

tions of -y„_i(x„_i) and of 7„(x„), respectively. Then §1 = J2f=i ^I'^fixl^J, 
Ot = Eti Ezez„ ^tH^)] /(x^J, where wl'^ = wl^^ [I - Pd,n] , w^'^iz) = 

"^7„ B„(.) and 

B„(z)=K„(z)+pd,«X!K>.5"(x!).„|z)+pd,„ Eu;,\_iPs,„(x^_i)p(z|x^_i). (74) 

i=l i=l 

By contrast, the computation of u^(x„) and of ^^(xn) depends on ?;„_i(x„_i). 
This suggests adapting the common methodology described in section 1, even 
though the PHD is not a pdf (it is a positive function, but remember from (62) 
that its integral is not equal to 1), and that weights {wl^_i}.^^i^ niay depend 
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on variables different from x^i^ ^ , but which are known at time n~ 1. These 
differences do not impact the discussion of section 1.3 which can be used in 
this context. Indeed, we have /„|„_i(x„)5„(z|x„) = p(x„|x„_i, z)p(z|x„_i), 
so 0^ and 0"^ can be rewritten as 

Ol=[l-pd,n] / E(/(x„)|x„_i) X [ps,„(x„_i)i>„_i(x„_i)] dx„_i, (75) 



^'=E/e(/(^«)|x„-i,z) 



(x„_i)p(z|x„_i)v„_i(x„_i) 
S„(z) 



dx„_{.76) 



Let us start with the computation of 0^^ in (75). Even if is not a pdf, the fac- 
tor ps,n(x„_i)t'n-i(xn-i) within brackets plays the role of ^(xi) in (7), and can 
be approximated by J2f=i^ wi''S^^^_^ where wi'' = [1 - Pd.n] PsAK-i)Wn-i- 
So the crude MC and Bayesian CMC estimators of 6?^ are respectively 

i=l 

Oi.n ^ ^«'"E(/(x„)|xj,_i), (78) 

1=1 

in which x^ ~ fn\n-i{'^n\^n-i)- Let us next address 0^ in (76). For each mea- 
surement z g Z„. the factor within brackets 
plays the role of p{xi) in (7), and can be approximated by Yld^i^ ^n^i^)^^^ ^ 

where wi^'*(z) = pd^„p5^„(x;_i)p(z|x^_;^)w,\_i/B„(z). So the crude MC and 
Bayesian CMC estimators of 0f^ are respectively 

z i=l 

^l=Y.Y. ^«?/(z)E(/(x„)|xj,_i,z), (80) 

in which x^'' ~ p(x„|x5j_i, z). 

In summary, the crude MC PHD estimator 0n of 0n is the sum of four 
crude MC estimators: 0n = X]i=i ^tu while our Bayesian CMC PHD es- 
timator 0n is a sum of two crude MC and two Bayesian CMC estimators: 
0n = 0\ + 0n + 6'n + ^'n- Siuce 6*^ and 0f^ are computed from the same 
MC approximation of v„_i(x„_i), 0„ = E(e„|{x^_i}f^f \ {xi^^jfjj" , Z„), so 
section 1 enables to conclude that 0n indeed outperforms 0„. 

Remark 5 The computation of 6*^+6'^ involves to sample Ln~i{\Z„ \+l) parti- 
cles where |Z„| is the cardinal of Z„. It is possible to compute an approximation 
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with L„_i particles by sampling ~ q^ix) with q'^{x) oc it;^'*/„|„_i(x„|x^_]^)+ 
i(jj^''(z)p(x„|x'j_]^, z) for all i. 1 < i < L„_i, and by taking 



E 



If, 



3,'i 



fiO- 



(81) 



Remark 6 Depending on the form of 7n(x„) and (/„(z|x„), J g„(z|x„)7„(x„)dx„ 
may be directly computable, so B'^{z), 6*2, n and 6*4, „ may be computable too. 
In this case one can replace i?„(z) in (74) by k„(z) + B^ {z) +pd,n J2 



4.3 Computing the CMC PHD filter On in practice 

In the multi-target filter problem, we look for computing an estimator of the 
number of targets and of multi-target states. From (62), an estimator of the 
number of targets is given by 

iV„ = Yl + E E ''(^) + E ' + E E ^nH^- (82) 

1=1 zez„ 4=1 1=1 zez„ 4=1 

The procedure to extract persistent targets consists in looking for local max- 
ima of J2f=i' K''Pi^n\K~i) + Ezez„ Ef=r' '«?/(z)p(x„|xj,„i, z). For birth 
targets, this procedure cannot be used if the PHD due to birth targets was 
computed via an MC approximation. One can use clustering techniques [34], 
or the procedure described in [35] , which consists in looking for measurements 
z such that X^i^i w^'^^z) is above a given threshold (typically 0.5); then an 
estimator of the state associated to z is given by J2i=i ^ra*(^)^7„- However, 
birth targets become persistent targets at the next time step; so their extrac- 
tion becomes easy at the next iteration since an SMC extraction procedure 
can be avoided. 

Remark 7 One can also adapt the procedure described above [35] to the ex- 
traction of persistent target states, i.e. looking for measurements z such that 
J2i=i^ w^'*(z) is above a given threshold, and estimating the associated state 
by X]i=i~^ ^n'H^) / XnJ'(x„|x^_;^, z)dx„. The advantage of this procedure is 
that we just need to compute / x„p(x„|x*j_j^, z)dx„ for such measurements. 

Let us now detail some applications of the CMC-PHD filter. 

4-3.1 Gaussian and linear models with Gaussian Mixture (GM) birth 
intensity: an alternative to the GM implementation of the PHD filter 

We first assume that /„|„_i(x„|x„_i) = 7V(x„; F„x„_i; Q„), g„(z|x„) = 
7V(z; H„x„; R„), and that 7„ is a GM, i.e. that 7„(x„) = X^ilT w^J^i^n, ; 
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Ply^)- For such models a GM implementation has been proposed [36], which 
consists in propagating a GM approximation of PHD Vn via (63)- (64). The 
mixture grows exponentially due to the summation on the set of measurements 
in (64), so pruning and merging approximations are necessary. In addition, this 
implementation requires that pd,n andps.n are constant (or possibly GM [36]). 
In our algorithm we do not need to make any assumption about Ps(x„_i). For 
this model B^{z) is directly computable, and the Bayesian CMC procedure 
for estimating the number of targets and extracting the states is valid since 
p(x„|x„_i,z) andp(z|x„_i) are computable (see (20)-(21) and (25)-(26)). Fi- 
nally, in the case where Ps(xn-i) is constant, we have at our disposal three 
implementations of the PHD filter: the GM [36], the SMC [34] and our Bayesian 
CMC implementations which will be compared in section 4.4 below. 

4-3.2 Gaussian and linear models with ordinary birth intensity 

If 7„ is not a GM the GM implementation cannot be used any longer. However, 
our method remains valid if we compute 02,n, 6)4, n and B^^{z) via an MC 
approximation. By constrast to the pure SMC technique, our Bayesian CMC 
implementation enables to keep the GM structure for persistent targets. 

4-3.3 Nan linear models 

In a non-linear model the GM implementation cannot be used any longer. The 
extended (resp. unscented) Kalman PHD filter [36] approximates the PHD by 
a GM, the parameters of which are propagated by an EKF (resp. UKF). By 
contrast, we propose to adapt our Bayesian CMC implementation, by approxi- 
mating p(x„|x„_i, z) and p(z|x„_i) at time n by techniques described in §2.2. 
The main difference is that we start from a discrete approximation of the PHD 
at time n — 1, and compute an estimate of the states without using clustering 
techniques of the MC implementation. This way we get an approximation of 
the PHD which does not rely on a numerical approximation at time 71 — 1 
and which enables to extract the states easily. In addition, by contrast to the 
extended and unscented implementations of the PHD filter, numerical approx- 
imations are not propagated over time since they are only used locally for the 
extraction of states. 

4.4 Simulations 

We now compare our Bayesian CMC PHD estimator to alternative implemen- 
tations of the PHD filter. The MSE criterion used previously is not appropriate 
in the multi-target context: since the number of targets evolves, a performances 
criterion should take into account an estimator of the number of targets and 
an estimator of their states. So in this section we will use the optimal subpat- 
tern assignment (OSPA) distance [37], which is a classical tool for comparing 
multi-target filtering algorithms. Let X = {xi, and Y = {2/1, i/n} 
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and g„(z„|x„) = A/'(z„; H„x„, R„), where F„ 



and the other pa- 



be two finite sets, which respectively represent the estimated and true sets of 
targets. For 1 < p < +00 and c > 0, let S'^\x,y) = min{c,\\x — y\\) (||.|| is 
the euclidean norm) and let 77„ be the set of permutations on {1, 2, n}. The 
OSPA metric is defined by : 

dl{X,Y)^ l^min fjdW(x,,y^(,))P + cP(n-,n)^^ ' (83) 

if TO < n, and d {X^ Y) ~ dAY, X) if to > n. The term min ^ d^''\xi^ 2/7r(i))^ 

represents the localization error, while the second term represents the cardi- 
nality error. 

We focus on the linear and Gaussian model in which the GM-PHD is used 
as a benchmark solution and enables to appreciate the performance of our 
Bayesian CMC-PHD fihcr. So we compare the GM-PHD, the SMC-PHD and 
our Bayesian CMC-PHD filters. We track the position and velocity of the tar- 
gets so x„ = [px,Vx,Py,Py\n- Let also /„|„_ i (x„ | x„_i) =7V^(x„;F„x„_i,Q„) 

[1 T 0' 
10 
1 T 
1 

rameters (H„, Q„ and R„) arc identical to those of §3.3. 

We compare the SMC-PHD and our Bayesian CMC filters in the case 
where both algorithms use the transition pdf /„|„_i(x„|x„_i) (remember that 
in our approach, we need to propagate a discrete approximation of the PHD, 
even if it not used for computing an estimator of the number of targets). 
We take T = 2s, ct„ = 3TO^/sec'^ but Ox = <7y = 0.3to, which means that the 
likelihood 5„(z|x„) is sharp; since the transition pdf does not take into account 
available measurements, it is difficult to guide particles into promising regions, 
so this experimental scenario is challenging for the SMC-PHD implementation. 
Particles are initialized around the measurements [35] . In both algorithms we 
use Nb = 20 particles per newborn target and N = 200 particles per persistent 
target. The probability of detection is pd.k = 0.95 and that of survival ps_k = 
0.98, for all fc, 1 < fc < 100, and we generate 10 false alarm measurements (in 
mean). We consider a scenario with 6 targets which appear either at A: = 0, 
= 20 or fc = 50. We also test the CM implementation in which Tp = 10^^ 
for the pruning threshold, T„i = 4to for the merging threshold and we keep at 
most A'niax = 100 Gaussians (see §4.3.1). 

The OSPA distance and estimated number of targets are displayed in Fig- 
ures 8 and 9. The Bayesian CMC approach outperforms the SMC one and 
copes with the issue of guiding particles in promising regions. Even if we 
use the transition density for getting a discrete approximation of Vn-i, the 
Bayesian CMC approach provides a correct estimate of the number of targets, 
by contrast to the SMC one in which the new set {xj,, is used to de- 

duce a discrete approximation of u„, then an estimate of the number of targets. 
The Bayesian CMC PHD estimator also outperforms the CM one in terms of 
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OSPA distance. Finally the number of targets is well estimated both by the 
GM and Bayesian CMC implementations, but the Bayesian CMC estimator is 
more accurate, see Figure 10. 




Fig. 9 Estimator of the number of targets for linear and Gaussian scenario. 



5 Conclusion 



In this paper we adapted CMC to single- and multi-object Bayesian filter- 
ing. In this framework, the recursive nature of SMC algorithms provides a 
conditioning variable at each time instant, but i.i.d. samples from this condi- 
tioning variable are unavailable. Our variance reduction method can be seen 
as a temporal, rather than spatial, RB-PF procedure; a Bayesian CMC esti- 
mator is ensured to outperform the associated crude MC one whatever the 
number of particles. We next showed that a CMC estimator can indeed be 
computed, or approximated, in a variety of Markovian stochastic models, in- 
cluding semi-linear HMC or JMSS, either at the same cost or at a reasonable 
extra computational cost. Finally we adapted Bayesian CMC to multi-target 
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Fig. 10 Standard Deviation of the estimator of the number of targets for linear and Gaus- 
sian scenario - The CMC estimator is slightly more reliable than the GM one when time 
increases. 

filtering, and showed that our CMC PHD filter has interesting practical fea- 
tures as compared to alternate (SMC or GM) implementations of the PHD 
filter. Our analysis was validated via simulations. 
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